Ensembles of (α)-Trees for Imbalanced Classification Problems

نویسندگان

  • Yubin Park
  • Joydeep Ghosh
چکیده

This paper introduces two kinds of decision tree ensembles for imbalanced classification problems, extensively utilizing properties of α-divergence. First, a novel splitting criterion based on α-divergence is shown to generalize several wellknown splitting criteria such as those used in C4.5 and CART. When the α-divergence splitting criterion is applied to imbalanced data, one can obtain decision trees that tend to be less correlated (α-diversification) by varying the value of α. This increased diversity in an ensemble of such trees improves AUROC values across a range of minority class priors. The second ensemble uses the same alpha trees as base classifiers, but uses a lift-aware stopping criterion during tree growth. The resultant ensemble produces a set of interpretable rules that provide higher lift values for a given coverage, a property that is much desirable in applications such as direct marketing. Experimental results across many class-imbalanced datasets, including BRFSS, and MIMIC datasets from the medical community and several sets from UCI and KEEL, are provided to highlight the effectiveness of the proposed ensembles over a wide range of data distributions and of class imbalance.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using Model Trees and Their Ensembles for Imbalanced Data

Model trees are decision trees with linear regression functions at the leaves. Although originally proposed for regression, they have also been applied successfully in classification problems. This paper studies their performance for imbalanced problems. These trees give better results that standard decision trees (J48, based on C4.5) and decision trees specific for imbalanced data (CCPDT: Clas...

متن کامل

Cost-sensitive decision tree ensembles for effective imbalanced classification

Real-life datasets are often imbalanced, that is, there are significantly more training samples available for some classes than for others, and consequently the conventional aim of reducing overall classification accuracy is not appropriate when dealing with such problems. Various approaches have been introduced in the literature to deal with imbalanced datasets, and are typically based on over...

متن کامل

New Ordering-Based Pruning Metrics for Ensembles of Classifiers in Imbalanced Datasets

The task of classification with imbalanced datasets have attracted quite interest from researchers in the last years. The reason behind this fact is that many applications and real problems present this feature, causing standard learning algorithms not reaching the expected performance. Accordingly, many approaches have been designed to address this problem from different perspectives, i.e., da...

متن کامل

A High-Performance Model based on Ensembles for Twitter Sentiment Classification

Background and Objectives: Twitter Sentiment Classification is one of the most popular fields in information retrieval and text mining. Millions of people of the world intensity use social networks like Twitter. It supports users to publish tweets to tell what they are thinking about topics. There are numerous web sites built on the Internet presenting Twitter. The user can enter a sentiment ta...

متن کامل

ارائه‌روش جدید مبتنی‌بر برنامه‌نویسی ژنتیک برای وزن‌دهی قوانین فازی در طبقه‌بندی نامتوازن

In classification problems, we often encounter datasets with different percentage of patterns (i.e. classes with a high pattern percentage and classes with a low pattern percentage). These problems are called “classification Problems with imbalanced data-sets”. Fuzzy rule based classification systems are the most popular fuzzy modeling systems used in pattern classification problems. Rule weights...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • IEEE Trans. Knowl. Data Eng.

دوره 26  شماره 

صفحات  -

تاریخ انتشار 2014